965 research outputs found

    Unsupervised Pretraining of Neural Networks with Multiple Targets using Siamese Architectures

    A model's response to a given input pattern depends on the patterns seen in the training data. The larger the amount of training data, the more likely it is that edge cases are covered during training. However, the more complex the input patterns are, the larger the model has to be. For very simple use cases, a relatively small model can achieve very high test accuracy in a matter of minutes. A large model, on the other hand, has to be trained for multiple days. The actual time needed to develop a model of that size can be considered even greater, since many different architecture types and hyper-parameter configurations often have to be tried. An extreme case of a large model is the recently released GPT-3. This model consists of 175 billion parameters and was trained on 45 terabytes of text data. The model was trained to generate text and is able to write news articles and source code based only on a rough description. However, such a model can only be created by researchers with access to specialized hardware or immense amounts of data. Thus, it is desirable to find less resource-intensive training approaches that enable other researchers to create well-performing models. This thesis investigates the use of pre-trained models. If a model has been trained on one dataset and is then trained on similar data, it learns to adjust to similar patterns faster than a model that has not yet seen any of the task's patterns. The lessons learned from one training run are thus transferred to another task. During pre-training, the model is trained to solve a specific task, such as predicting the next word in a sequence, or first encoding an input image before decoding it. Such models contain an encoder part and a decoder part. When transferring such a model to another task, some of the model's layers are removed.
As a result, having to discard fewer weights leads to faster training, since less time has to be spent on training parts of a model that are only needed to solve an auxiliary task. Throughout this thesis, the concept of siamese architectures is discussed, since with that architecture no parameters have to be discarded when a model trained with this approach is transferred to another task. The siamese pre-training approach thus reduces the need for resources such as time and energy and drives the development of new models in the direction of Green AI. The models trained with this approach are evaluated by comparing them to models trained with other pre-training approaches as well as to large existing models. It is shown that the models trained for the tasks in this thesis perform as well as externally pre-trained models, given the right choice of data and training targets: the number and type of training targets during pre-training impact a model's performance on transfer learning tasks. The use cases presented in this thesis cover data from different domains to show that the siamese training approach is widely applicable. Consequently, researchers are encouraged to create their own pre-trained models for data domains for which no pre-trained models exist yet.

A model's prediction depends on which patterns are present in the data used during training. The larger the amount of training data, the more likely it is that edge cases occur in the data. However, the larger the number of patterns to be learned, the larger the model must be. For simple use cases it is possible to train a small model within a few minutes and already obtain good results on test data. For complex use cases, however, a correspondingly large model may need up to several days of training to become sufficiently good.
An extreme case of a large model is the recently released model named GPT-3, which consists of 175 billion parameters and was trained on roughly 45 terabytes of text data. The model was trained to generate text and is able to generate news articles based on a rough initial description. Only researchers with access to the corresponding hardware and amounts of data can develop such a model. It is therefore of interest to improve training procedures so that models for complex use cases can be trained even with few available resources. This thesis is concerned with the pre-training of neural networks. If a neural network has been trained on one dataset and is then trained further on a second dataset, it learns the features of the second dataset faster, since it does not have to learn patterns from scratch but can draw on what it has already learned. This is referred to as transferring knowledge. During pre-training, a model is often given a task such as, in the case of image data, first compressing the training data and then reconstructing it. For text data, a model could be pre-trained by receiving a sentence as input and having to predict the next sentence of the source document. Such models accordingly consist of an encoder and a decoder. The drawback of this approach is that the decoder is only needed for pre-training, while the later use case requires only the encoder. A central part of this thesis is therefore the investigation of the advantages and disadvantages of the siamese model architecture. This architecture consists only of an encoder, which makes pre-training cheaper, since fewer weights have to be trained.
The main scientific contribution is a thorough comparison of the siamese architecture with comparable approaches. Certain drawbacks are identified, for example that the choice of similarity function or the composition of the training data has a large impact on model training. It is worked out which similarity function is recommended in which contexts, and how other drawbacks of the siamese architecture can be compensated by adjusting the training targets. The corresponding experiments are carried out on data from different domains to show that the approach is universally applicable. Results from concrete use cases also show that the models developed within this thesis perform comparably to externally available models that were trained at great resource cost. This shows that carefully designed architectures can reduce the required resources.
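The weight-sharing idea behind the siamese approach described above can be sketched in a few lines; the encoder, the dimensions, and the data below are hypothetical toy choices for illustration, not the thesis's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared linear encoder maps both inputs of a pair into the same
# embedding space -- the "siamese" weight sharing.
W = rng.normal(0, 0.1, (16, 4))  # input dim 16 -> embedding dim 4

def encode(x):
    return x @ W  # both branches use the identical weights

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# A "positive" pair (two noisy views of the same pattern) should score
# higher than a "negative" pair (two unrelated patterns).
anchor = rng.normal(0, 1, 16)
positive = anchor + rng.normal(0, 0.05, 16)  # slightly perturbed view
negative = rng.normal(0, 1, 16)              # unrelated pattern

print(cosine(encode(anchor), encode(positive)))  # close to 1
print(cosine(encode(anchor), encode(negative)))
```

Because both branches share the single weight matrix W, nothing has to be discarded when the encoder is transferred to a downstream task, which is the resource advantage the thesis argues for.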

    Implementing the Perkins V Career and Technical Education Act for the Advancement of STEM Careers Among Tribal Communities

    Career and technical education courses, funded by a 2018 re-authorization known as the Strengthening Career and Technical Education for the 21st Century Act, provide an ideal opportunity for the advancement of science, technology, engineering, and mathematics (STEM) careers in the United States. The article takes a close look at the conditions and implications of the re-authorized Perkins V for the advancement and promotion of career and technical education courses among tribal communities

    Magnetic field dependence of the internal quality factor and noise performance of lumped-element kinetic inductance detectors

    We present a technique for increasing the internal quality factor of kinetic inductance detectors (KIDs) by nulling ambient magnetic fields with a properly applied magnetic field. The KIDs used in this study are made from thin-film aluminum, they are mounted inside a light-tight package made from bulk aluminum, and they are operated near 150 mK. Since the thin-film aluminum has a slightly elevated critical temperature (T_c = 1.4 K), it transitions before the package (T_c = 1.2 K), which also serves as a magnetic shield. On cooldown, ambient magnetic fields as small as approximately 30 μT can produce vortices in the thin-film aluminum as it transitions, because the bulk aluminum package has not yet transitioned and therefore is not yet shielding. These vortices become trapped inside the aluminum package below 1.2 K and ultimately produce low internal quality factors in the thin-film superconducting resonators. We show that by controlling the strength of the magnetic field present when the thin film transitions, we can control the internal quality factor of the resonators. We also compare the noise performance with and without vortices present, and find no evidence for excess noise beyond the increase in amplifier noise, which is expected with increasing loss.
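The internal quality factor discussed above is conventionally separated from the measured (loaded) quality factor via the standard resonator relation 1/Q_total = 1/Q_i + 1/Q_c. A minimal sketch, with purely illustrative numbers:

```python
def internal_q(q_total, q_coupling):
    """Recover the internal quality factor Q_i from the measured (loaded)
    quality factor and the coupling quality factor:
    1/Q_total = 1/Q_i + 1/Q_c  =>  Q_i = 1/(1/Q_total - 1/Q_c)."""
    return 1.0 / (1.0 / q_total - 1.0 / q_coupling)

# Hypothetical example: a loaded Q of 50,000 with a coupling Q of 100,000
print(internal_q(50_000, 100_000))  # ~1.0e5
```

Trapped vortices add loss, i.e. they lower Q_i while Q_c (set by the circuit geometry) stays fixed, which is why the loaded quality factor drops when the film transitions in an unnulled field.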

    Examining how companies’ support of tourist attractions affects visiting intentions: The mediating role of perceived authenticity

    As public funding for the restoration of tourist attractions decreases, assistance is often sought from the private sector in the form of corporate social responsibility (CSR). However, research has yet to establish how such CSR activities affect the beneficiary, namely tourist attractions. Extending past CSR literature, we therefore explore whether differing company CSR motivations can influence tourists' visiting intentions. The results of two experimental studies show that low company altruism (e.g., demanding to acquire naming rights of the site), compared to high company altruism (e.g., demanding nothing in return), decreases visiting intentions. Furthermore, we show that the perceived authenticity of the site mediates this effect. Finally, we find that the negative effect of low-altruism CSR is mitigated in the case of no heritage. Based on the results, we show that tourist attraction managers should be wary of companies displaying non-altruistic intentions, as such activity may have harmful consequences.
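The mediation claim above (perceived authenticity carries part of the effect of CSR motivation on visiting intentions) corresponds to the classic regression-based indirect effect a·b. A minimal sketch on simulated data; all coefficients, sample sizes, and variable names are hypothetical, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: X = company altruism (0 = low, 1 = high),
# M = perceived authenticity, Y = visiting intention.
x = rng.integers(0, 2, n).astype(float)
m = 0.6 * x + rng.normal(0, 1, n)            # path a: X -> M
y = 0.5 * m + 0.1 * x + rng.normal(0, 1, n)  # paths b and c': M -> Y, X -> Y

def ols(cols, target):
    """Least-squares coefficients with an intercept column prepended."""
    X = np.column_stack([np.ones(len(target)), *cols])
    return np.linalg.lstsq(X, target, rcond=None)[0]

a = ols([x], m)[1]      # effect of X on the mediator M
b = ols([x, m], y)[2]   # effect of M on Y, controlling for X
indirect = a * b        # mediated (indirect) effect of X on Y
print(f"a = {a:.2f}, b = {b:.2f}, indirect effect = {indirect:.2f}")
```

A nonzero a·b is the signature of mediation: the altruism manipulation shifts perceived authenticity, which in turn shifts visiting intentions.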

    Genomic features of bacterial adaptation to plants

    Author(s): Levy, A; Salas Gonzalez, I; Mittelviefhaus, M; Clingenpeel, S; Herrera Paredes, S; Miao, J; Wang, K; Devescovi, G; Stillman, K; Monteiro, F; Rangel Alvarez, B; Lundberg, DS; Lu, TY; Lebeis, S; Jin, Z; McDonald, M; Klein, AP; Feltcher, ME; Rio, TG; Grant, SR; Doty, SL; Ley, RE; Zhao, B; Venturi, V; Pelletier, DA; Vorholt, JA; Tringe, SG; Woyke, T; Dangl, JL. Abstract: Plants intimately associate with diverse bacteria. Plant-associated bacteria have ostensibly evolved genes that enable them to adapt to plant environments. However, the identities of such genes are mostly unknown, and their functions are poorly characterized. We sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize. We then compared 3,837 bacterial genomes to identify thousands of plant-associated gene clusters. Genomes of plant-associated bacteria encode more carbohydrate metabolism functions and fewer mobile elements than related non-plant-associated genomes do. We experimentally validated candidates from two sets of plant-associated genes: one involved in plant colonization, and the other serving in microbe-microbe competition between plant-associated bacteria. We also identified 64 plant-associated protein domains that potentially mimic plant domains; some are shared with plant-associated fungi and oomycetes. This work expands the genome-based understanding of plant-microbe interactions and provides potential leads for efficient and sustainable agriculture through microbiome engineering.
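Identifying "plant-associated" gene clusters of the kind described above amounts to an enrichment test over presence/absence counts across genome groups. A minimal sketch using a one-sided Fisher exact test on hypothetical counts (the paper's actual statistical pipeline is not reproduced here):

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test: probability, under the hypergeometric
    null, of a 2x2 table at least as enriched as (a, b; c, d)."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)
    return sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    ) / denom

# Hypothetical 2x2 counts for one candidate gene cluster:
# rows = cluster present / absent, cols = plant-associated / other genomes.
p_value = fisher_exact_greater(120, 30, 80, 170)
print(f"p = {p_value:.3g}")  # small p: cluster enriched in plant-associated genomes
```

Repeating such a test per cluster across thousands of genomes (with multiple-testing correction) yields candidate plant-association genes for experimental validation.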

    Development of dual-polarization LEKIDs for CMB observations

    We discuss the design considerations and initial measurements from arrays of dual-polarization, lumped-element kinetic inductance detectors (LEKIDs) nominally designed for cosmic microwave background (CMB) studies. The detectors are horn-coupled, and each array element contains two single-polarization LEKIDs, which are made from thin-film aluminum and optimized for a single spectral band centered on 150 GHz. We are developing two array architectures, one based on 160 micron thick silicon wafers and the other based on silicon-on-insulator (SOI) wafers with a 30 micron thick device layer. The 20-element test arrays (40 LEKIDs) are characterized with both a linearly-polarized electronic millimeter wave source and a thermal source. We present initial measurements including the noise spectra, noise-equivalent temperature, and responsivity. We discuss future testing and further design optimizations to be implemented

    A Multisite Preregistered Paradigmatic Test of the Ego-Depletion Effect

    We conducted a preregistered multilaboratory project (k = 36; N = 3,531) to assess the size and robustness of ego-depletion effects using a novel replication method, termed the paradigmatic replication approach. Each laboratory implemented one of two procedures that was intended to manipulate self-control and tested performance on a subsequent measure of self-control. Confirmatory tests found a nonsignificant result (d = 0.06). Confirmatory Bayesian meta-analyses using an informed-prior hypothesis (δ = 0.30, SD = 0.15) found that the data were 4 times more likely under the null than the alternative hypothesis. Hence, preregistered analyses did not find evidence for a depletion effect. Exploratory analyses on the full sample (i.e., ignoring exclusion criteria) found a statistically significant effect (d = 0.08); Bayesian analyses showed that the data were about equally likely under the null and informed-prior hypotheses. Exploratory moderator tests suggested that the depletion effect was larger for participants who reported more fatigue but was not moderated by trait self-control, willpower beliefs, or action orientation.
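The informed-prior Bayes factor reported above can be illustrated with the standard normal-likelihood, normal-prior marginalization. The observed effect size follows the abstract (d = 0.06), but the standard error is an assumed illustrative value, not the paper's exact input:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sd):
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))

# Observed meta-analytic effect and an assumed standard error.
d_obs, se = 0.06, 0.034

# H0: delta = 0.  H1 (informed prior): delta ~ Normal(0.30, 0.15).
# With a normal likelihood and a normal prior, the marginal likelihood
# under H1 is again normal, with variance se^2 + prior_sd^2.
like_h0 = normal_pdf(d_obs, 0.0, se)
like_h1 = normal_pdf(d_obs, 0.30, sqrt(se**2 + 0.15**2))

bf01 = like_h0 / like_h1  # Bayes factor in favor of the null
print(f"BF01 = {bf01:.1f}")
```

With these illustrative inputs the data come out a few times more likely under the null, consistent in direction with the roughly 4:1 factor the project reports.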